A Job Self-scheduling Policy for HPC Infrastructures
Abstract
The number of distributed high performance computing architectures has increased exponentially in recent years. As a result, systems composed of several computational resources provided by different research centers and universities have become very popular. Job scheduling policies have been adapted to these new scenarios, in which several independent resources have to be managed. New policies have been designed to take into account issues such as multi-cluster environments, heterogeneous systems and the geographical distribution of the resources. Several centralized scheduling solutions have been proposed in the literature for these environments, such as centralized schedulers, centralized queues and global controllers. These approaches use a single scheduling entity that is responsible for scheduling all the jobs submitted by the users. In this paper we propose the use of self-scheduling techniques for dispatching the jobs that are submitted to a set of distributed computational hosts managed by independent schedulers (such as MOAB or LoadLeveler). It is a non-centralized, job-guided scheduling policy whose main goal is to optimize the job wait time. Thus, scheduling decisions are made independently for each job instead of using a global policy in which all the jobs are considered. In addition, as part of the proposed solution, we demonstrate how the use of job wait time prediction techniques can substantially improve the performance obtained in the described architecture.
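The job-guided idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `Job` fields, the `select_host` function, the host names and the toy wait-time formulas are all hypothetical, and the real policy would obtain its estimates from each host's local scheduler. The sketch only shows the decision being made per job, without a global scheduler, by picking the host with the lowest predicted wait time.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Job:
    """Minimal job description (hypothetical fields, for illustration only)."""
    name: str
    processors: int
    requested_runtime_s: int


# A predictor maps a job to an estimated wait time in seconds on one host.
# Assumption: in a real deployment this estimate would be supplied by the
# host's local scheduler; here it is a plain callable to keep the sketch
# self-contained.
WaitPredictor = Callable[[Job], float]


def select_host(job: Job, predictors: Dict[str, WaitPredictor]) -> str:
    """Return the host with the lowest predicted wait time for this one job."""
    if not predictors:
        raise ValueError("no hosts available")
    return min(predictors, key=lambda host: predictors[host](job))


# Toy predictors (made-up formulas): a fixed backlog plus a per-processor cost.
predictors = {
    "cluster_a": lambda job: 3600.0 + 10.0 * job.processors,
    "cluster_b": lambda job: 600.0 + 60.0 * job.processors,
}

small = Job("small", processors=4, requested_runtime_s=1800)
large = Job("large", processors=128, requested_runtime_s=7200)

# Each job is dispatched independently of the others.
print(select_host(small, predictors))
print(select_host(large, predictors))
```

With these toy numbers the small job is sent to `cluster_b` and the large job to `cluster_a`, which shows how two jobs submitted at the same time can be routed differently when each decision uses only that job's own predicted wait.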
Similar resources
Self-tuning job scheduling strategies for the resource management of HPC systems and computational grids
In this thesis we develop and study self-tuning job schedulers for resource management systems. Such schedulers search for the best solution among the available scheduling alternatives in order to improve on the performance of static schedulers. This concept is implemented in two domains of real-world job scheduling. First of all, we study the scheduling in resource management software for high pe...
Job-Guided Scheduling Strategies for Multi-Site HPC Infrastructures
Since the early eighties, HPC architectures have evolved from single-processor machines to very sophisticated architectures such as multi-cluster systems composed of heterogeneous nodes. Commonly, access to these systems has been controlled by batch systems which schedule and manage the jobs that users submit. As is shown in figure 1.1, these scheduling systems (henceforth referred to as “local s...
BSLD Threshold Driven Parallel Job Scheduling for Energy Efficient HPC centers
Recently, power awareness in the high performance computing (HPC) community has increased significantly. While CPU power reduction of HPC applications using Dynamic Voltage Frequency Scaling (DVFS) has been explored thoroughly, CPU power management for large scale parallel systems at the system level has been left unexplored. In this paper we propose a power-aware parallel job scheduler assuming DVFS enable...
Parallel job scheduling for power constrained HPC systems
Power has become the primary constraint in high performance computing. Traditionally, parallel job scheduling policies have been designed to improve certain job performance metrics when scheduling parallel workloads on a system with a given number of processors. The number of available processors is no longer the only limitation in parallel job scheduling. The recent increase in processor pow...
The Self-Tuning dynP Job-Scheduler
In modern resource management systems for supercomputers and HPC clusters, the job scheduler plays a major role in improving the performance and usability of the system. The performance of the scheduling policies used (e.g. FCFS, SJF, LJF) depends on the characteristics of the queued jobs. Hence we developed the dynP scheduler family. The basic idea was to change between different scheduling pol...